Unsupervised Coreference of Publication Venues

Authors

  • Robert Hall
  • Charles Sutton
  • Andrew McCallum

Abstract

Information about the venues of research papers is useful for information retrieval and for automatic mining of the literature. Important to processing venue information is venue coreference, the task of determining which possibly dissimilar mentions of venues refer to the same underlying venue. A natural unsupervised technique for this problem is generative mixture modeling, and indeed such models have been successfully applied to paper and author coreference. But standard models perform poorly on venue strings, because venue strings exhibit greater variance than title or author strings. In this paper, we exploit the fact that venues have characteristic distributions over titles. We do this using a generative model that explicitly models a venue-specific distribution over title words. The model uses a single set of latent variables to control two disparate clustering models: a Dirichlet-multinomial model over titles, and a nonexchangeable string-edit model over venues. Incorporating title information yields a substantial improvement in performance—a 58% reduction in error over a standard Dirichlet process mixture. The model successfully disambiguates several venues that have string-identical abbreviations.
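The abstract does not give the model's equations. The following Python sketch is an illustration only. It shows how a venue-specific Dirichlet-multinomial over title words can score which venue cluster a new mention belongs to, combined with a Chinese restaurant process prior as in a standard Dirichlet process mixture. The function names `dm_log_predictive` and `assignment_posterior`, the symmetric hyperparameters `alpha` and `gamma`, and the toy data are all hypothetical. The sketch leaves out the string-edit model over venue strings entirely and is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical sketch: collapsed Dirichlet-multinomial scoring of title words
# plus a Chinese restaurant process (CRP) prior. The string-edit model over
# venue strings described in the abstract is NOT included here.


def dm_log_predictive(title_words, cluster_counts, vocab_size, alpha):
    """Log-probability of the given title words under a symmetric
    Dirichlet(alpha)-multinomial, conditioned on the word counts already
    assigned to one venue cluster (multinomial coefficient omitted)."""
    new_counts = Counter(title_words)
    total = sum(cluster_counts.values())
    n = sum(new_counts.values())
    log_p = math.lgamma(total + vocab_size * alpha) - math.lgamma(
        total + n + vocab_size * alpha
    )
    for word, k in new_counts.items():
        c = cluster_counts.get(word, 0)
        log_p += math.lgamma(c + k + alpha) - math.lgamma(c + alpha)
    return log_p


def assignment_posterior(title_words, clusters, vocab_size, alpha=0.1, gamma=1.0):
    """Posterior over existing venue clusters plus one new cluster.

    clusters maps a cluster name to (number_of_mentions, Counter of title words).
    gamma is the CRP concentration parameter (hypothetical default).
    """
    n_mentions = sum(m for m, _ in clusters.values())
    scores = {}
    for name, (m, counts) in clusters.items():
        prior = math.log(m / (n_mentions + gamma))
        scores[name] = prior + dm_log_predictive(title_words, counts, vocab_size, alpha)
    new_prior = math.log(gamma / (n_mentions + gamma))
    scores["<new>"] = new_prior + dm_log_predictive(
        title_words, Counter(), vocab_size, alpha
    )
    # Normalize in log space (log-sum-exp) for numerical stability.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {name: math.exp(s - top) / z for name, s in scores.items()}


# Toy data (invented): two venue clusters with different title vocabularies.
clusters = {
    "venue_A": (10, Counter({"learning": 8, "kernel": 5, "model": 7})),
    "venue_B": (10, Counter({"query": 9, "database": 6, "index": 5})),
}
posterior = assignment_posterior(["learning", "kernel"], clusters, vocab_size=1000)
for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.3f}")
```

In this toy setting the title words alone favor `venue_A`. This illustrates why title evidence can help separate venues whose strings look alike, such as identical abbreviations. According to the abstract, the paper's model couples this kind of title model with a string-edit model over venue strings through a single set of latent cluster assignments.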


Similar papers

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Chinese Event Coreference Resolution: An Unsupervised Probabilistic Model Rivaling Supervised Resolvers

Recent work has successfully leveraged the semantic information extracted from lexical knowledge bases such as WordNet and FrameNet to improve English event coreference resolvers. The lack of comparable resources in other languages, however, has made the design of high-performance non-English event coreference resolvers, particularly those employing unsupervised models, very difficult. We propo...

Unsupervised Models for Coreference Resolution

We present a generative model for unsupervised coreference resolution that views coreference as an EM clustering process. For comparison purposes, we revisit Haghighi and Klein’s (2007) fully-generative Bayesian model for unsupervised coreference resolution, discuss its potential weaknesses and consequently propose three modifications to their model. Experimental results on the ACE data sets sh...

Unsupervised Ranking Model for Entity Coreference Resolution

Coreference resolution is one of the first stages in deep language understanding and its importance has been well recognized in the natural language processing community. In this paper, we propose a generative, unsupervised ranking model for entity coreference resolution by introducing resolution mode variables. Our unsupervised system achieves 58.44% F1 score of the CoNLL metric on the English...

A Clustering Approach for Unsupervised Chinese Coreference Resolution

Coreference resolution is the process of identifying expressions that refer to the same entity. This paper presents a clustering algorithm for unsupervised Chinese coreference resolution. We investigate why Chinese coreference is hard and demonstrate that techniques used in coreference resolution for English can be extended to Chinese. The proposed system exploits clustering as it has advantage...

Publication date: 2007